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ABSTRACT 

We suggest a method for statistical tests which does not suffer from a posteriori manipulations with 
tested samples (e.g. cuts optimization) and does not require a somewhat obscure procedure of the penalty 
estimate. The idea of the method is to hide the real sample (before it has been studied) among a large 
number of artificial samples, drawn from a random distribution expressing the null hypothesis, and then 
to search for it as the one demonstrating the strongest hypothesized effect. The statistical significance of 
the effect in this approach is the inverse of the maximal number of random samples at which the search 
was successful. We have applied the method to revisit the problem of correlation between the arrival 
directions of ultra-high energy cosmic rays and BL Lac objects. No significant correlation was found. 

Subject headings: catalogues - cosmic rays - methods: data analysis - methods: statistical 



1. INTRODUCTION 

Communications about effects detected at a marginally 
significant level constitute a considerable fraction of all 
scientific results. The scientific society usually treats 
such communication with skepticism. Indeed, too many 
marginally significant effects have not withstood the data 
accumulation. 

High energy astrophysics gives a number of instructive 
examples of searches for marginally significant effects. In- 
deed, there are many detection of particles or transient 
gamma-ray events whose sources (i.e. objects known from 
observations at other wavelenghts) are unknown. This 
stimulates intensive searches of various correlations be- 
tween different classes of events and objects. For example, 
there is a number of works reporting detections of cor- 
relation between locations of gamma-ray bursts (or their 
sub-samples) and various objects: galaxy clusters (Kolat 
& Piran 1996), Galactic plane (Belli 1997), and the local 
galactic arm (Romberg, Kurt, & Tikhomirova 1997). None 
of these results has been confirmed. Another similar area is 
ultra-high energy cosmic rays (UHECRs) and searches for 
their hypothetic sources. A claim of significant autocorre- 
lation in the arrival directions of UHERCs detected by the 
Akeno Giant Air Shower Array (AGASA) (Hayashida et 
al. 1996; Takeda et al. 1999) motivated searches of cross- 
correlations between UHECRs and various astrophysical 
objects. Particularly there were reported statistically sig- 
nificant cross-correlation signals between UHECRs and BL 
Lac objects (Tinyakov & Tkachev 2001, hereafter TT01, 
but see Evans, Ferrer & Sarkar 2003), super-galactic plane 
(Uchihori et al. 2000), radio-loud compact quasars (Vir- 
mani et al. 2002), highly luminous, bulge-dominated galax- 
ies (presumably, nearby quasar remnants, Torres et al. 
2002) and Seyfert galaxies (Uryson 2004). 

The reason for abundance of detected correlations is 
quite evident: a number of various possible effects, which 
have been searched for with statistical methods, is large 



and it is not surprising that some of them demonstrate a 
marginally significant signal just by chance. The situation 
is even worse because typically a probed effect is somewhat 
uncertain and the researcher tries different versions of the 
hypothesis, varying parameters and applying various cuts 
to the data samples. This means that the researcher per- 
forms a number of tests of the same effect which arc neither 
independent nor completely dependent. These numerous 
trials, again, increase the probability to observe a signal 
in one of the trials by chance and the analysis of this kind 
of bias is difficult. We illustrate this problem in Section 4 
and in Fig. 1. 

Does it mean that one should reject the possibility to 
manipulate the data samples with cuts and parameters? 
A blind test when all cuts and parameters in a statisti- 
cal test have been set and motivated a priori, is a good 
style. But there are many situations when such a priori 
definition of a test is very problematic and the investiga- 
tor sometimes really needs the rights to vary the testing 
procedure and to see what will happen. 

In principle, the researcher can account for these nu- 
merous trials using random samples, representing the null 
hypothesis. Often it is done in the following way (see 
e.g. TT01). The investigator prepares a large array of 
N random samples, d 1 , and does the same estimate of the 
effect for each of these samples as he does for the real 
sample, d°, in each statistical trial. Let a statistic asso- 
ciated with a confidence of the effect (e.g. 1 — p, where 
p is the probability to obtain the result from null hy- 
pothesis) be Sj — F{d l 1 Cj), where Cj is a set of cuts 
and/or analysis parameters from the universe C of all 
cuts and parameters. First, one finds the maximum for 
the real sample S^ ax = max{F(d°, Cj)} which is reached 
at j = jo- Then, one performs similar search for ran- 
dom samples 5^ ax = max{F(cf, Cj)}. The significance 
can be defined as the fraction of random samples sat- 
isfying the condition S^^ > S!jJ,„. This value differs 
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from the straightforward (uncorrected for numerous tri- 
als) estimate of the significance by the "penalty factor" 
tf(SL« > S° maK )/N(Si > S° max ), where S* = F(d\C jo ) 
and N(») is the number of samples satisfying a condition •. 
This procedure is sufficient if (i) the investigator follows 
the above procedure precisely; (ii) the investigator does 
not use the o posteriori information on the real sample for 
the planning of the investigation strategy. 

We would like to notice that both conditions are not so 
easy to satisfy once the investigator studied the real sample 
and feels which combination of cuts or model parameters 
will provide the most significant signal. Then he can find 
the most favorable trial intuitively, avoiding a large num- 
ber of unfavorable ones. In other terms, the investigator 
can introduce a bias in the choice of Cj and overestimate 
the significance of the effect using a posteriori knowledge. 
We should emphasize that the investigator can introduce 
such bias not deliberately. This is a serious disadvantage 
of the approach. Such kind of bias is difficult to trace and 
we consider this method to be insufficiently credible. 

In this work, we suggest a new approach giving a simple 
way of avoiding this "pressure" of the o posteriori infor- 
mation. The investigator can hide the real sample inside a 
large array of random null hypothesis samples prior to any 
data analysis. Now we have a single array of N samples d l . 
One of them is real (the investigator does not know which), 
other are random. The problem is thus inverted: instead 
of confirming the hypothesis using the real sample, the 
investigator must find the real sample in the array using 
the hypothesis that the verified effect exists, i.e. find z max 
corresponding to the maximum value of Sj = F(d l ,Cj). 
This is a blind test: the investigator does not know where 
is the real sample and he can feel free to perform numer- 
ous trials. If the investigator finds the real sample, the 
significance of the effect is just the inverse of the number 
of samples in the array. 

An alternative to our method is the cross-validation 
method, where the search for an effect is carried out on 
a fraction of the data sample. Therefore it is less sensi- 
tive. Below we demonstrate our method applying it to the 
problem of UHECRs - BL Lacs correlation. 

2. PROCEDURE 
2.1. Catalogs 

We used the AGASA sample of UHECRs with 58 events 
above 4 x 10 19 eV and a catalog of Veron-Cetty & Veron 
(2003) containing 876 BL Lac objects. We do not com- 
bine the AGASA sample with the data from other experi- 
ments because other samples are smaller and problems as- 
sociated with the non-uniform structure of a joint sample 
would overweight the statistical gain. The BL Lac cata- 
log has been cut in declination at —10° and was subject 
to various brightness cuts. We also tried a sub-catalog of 
confirmed BL Lacs which includes 491 objects. Actually it 
is not clear which catalog is more relevant (TT01 used a 
confirmed sub-catalog) and therefore we try both variants. 

2.2. Null hypothesis and random samples 

Null hypothesis in our case is just the isotropic distribu- 
tion of arrival directions of UHECRs convolved with the 
AGASA exposure function. The latter is a function of 
declination and does not depend on right ascension. This 



provides a simple way to prepare random, null-hypothesis 
samples avoiding possible uncertainties in the latitude ex- 
posure function: to sample the right ascension uniformly 
keeping the actually observed declination for each event. 
We, nevertheless, have dispersed the declinations of UHE- 
CRs by ±3° around their real values in order to destroy 
a possible small-scale latitude correlation, if the latter ex- 
ists. Such small dispersion does not distort a much wider 
exposure function. 

When performing the test we have distributed roles: one 
of the coauthors acts as an "investigator" , another plays a 
role of "examiner" . Examiner has prepared an array of 99 
random samples as described above and inserted the real 
sample into the array keeping the sequential real sample 
number in secret from the investigator. He did not partic- 
ipate in the data analysis until the investigator made his 
final choice. 

2.3. Measure for the correlation signal 

We used usual two-point correlation function counting 
the number n of UHECRs within angle 5 from any BL 
Lac of a given catalog. Then we compare this number 
with expectation n e for the null hypothesis: 

n e = Ar BLj V U i _ cos( _ iQo) ! (1) 

where Abl is the number of BL Lacs in the catalog, 
N v = 58 is the number of UHECRs, -10° is the decli- 
nation cut on BL Lacs. Note that this expectation implies 
an isotropic distribution of at least one sample. This is 
not the case because the AGASA sample has a latitude 
anisotropy and BL Lac catalog is anisotropic respectively 
to the galactic plane (selection effect) and the cosmologi- 
cal large-scale structure. A more accurate estimate differs 
from that given by equation (1) by a factor 
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where is the AGASA exposure function. The exposure 
function depends on particle energy and is hardly known 
better than one can extract from the latitude distribution 
of detected UHECRs. Takeda et al. (1999) use a poly- 
nomial fit to the observed latitude distribution of events 
above 10 19 eV. We prefer to use the observed distribution 
of the available AGASA sample (above 4 x 10 19 eV) in a 
form of histogram in cos 9 with the bin width 0.1 since this 
is a simplest option that can be easily reproduced. 

Factor F depends on the BL Lac catalog and there- 
fore on cuts. According to our estimates with equation 
(2), F is close to 1 for radio-bright objects and ~ 1.2 
for optically-bright objects (probably due to anisotropy 
caused by galactic absorption). We introduce the measure 
of the signal, p (which depends on S and cuts in the BL 
Lac catalog), as the probability to sample n or more hits 
from the Poisson distribution at expectation Fn e . 

Note that for autocorrelated samples the distribution of 
n is not Poisson, therefore this measure is not exact. In 
order to correct this probability for the actual autocor- 
related distribution of BL Lacs on the sky, we perform 
Monte-Carlo simulations using a large number of random 
UHECR samples. The maximal disagreement between the 
Poisson and Monte-Carlo probabilities is by a factor of 2. 
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Thus, we use the Poisson probability, p, for preliminary 
estimates and recalculate the probability for the leading 
samples (given in Table 1) with Monte-Carlo simulations. 

TABLE 1 

Samples of UHECRs demonstrating the most significant 
correlation with bl lacs 

R or O a UP iV obj c C* r or C* D d 5 s p x 10 4 1 

(Jy or V) (dcg) 

All quasars 



R 


90 


256 


0.04 


2 


2.6 


R 


40 


139 


0.16 


3 


3.2 


R 


11 


35 


0.79 


2 


5 





90 


153 


17.5 


2 


3.12 






Confirmed BL Lacs 
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11 
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0.79 


2 


1.1 


R 


4 


197 


0.02 


1.5 


3.47 


R 


90 


6 


0.79 


3 


8 


O 


4 


118 


18 


1.5 


1.15 



a Cut applied in radio, R, or optical, O, brightness 

b Identification number of a sample giving the strongest correlation 

signal 

c Number of objects passing the cut 

d Optimal cut in 6 GHz radio flux or visual magnitude 

e Optimal correlation angle 

f Significance level 

3. SEARCH FOR THE BEST-CORRELATING SAMPLE AND 
ITS RESULTS 

Optimizing cuts in all existing parameters we can fit a 
BL Lac catalog to any set of locations in the sky so that it 
will demonstrate a highly significant correlation (see Sect. 
4). Therefore, if our objective is to find the real sample, we 
have to try the most relevant cuts. The apparent radio- or 
optical brightness of objects (represented in the catalog by 
their observed radio flux density measured in Jy and the 
visual magnitude V) seem to be good indicators of par- 
ticle acceleration to ultra-high energies. To avoid "over- 
optimization" of random samples in two-dimensional scan, 
we performed two separate scans: 

1 . Wc optimized cut C r in the 6 GHz radio flux within 
the limits 0.01 Jy < C r < 2 Jy, varying it with the 
step 0.1 in decimal logarithm. No cuts in optical 
brightness was applied. This scan is marked with 
letter R in Table 1. 

2. We optimized cut C a in visual magnitude within 
the range from V = 12 to V = 24 with the step 
AV = 0.5. No cuts in radio flux was applied and 
we excluded objects with no data on their radio 
brightness. This scan is marked with letter O in 
Table 1. 

The proper correlation angle S is somewhat uncertain. 
The most significant correlation should not certainly ap- 
pear at a correlation angle equal to la experimental error 
(the latter depends on the particle energy). If UHECRs 
are charged, then the correlation could appear at S cor- 
responding to a typical angle of particle deflection. We 
optimized S between 1?5 and 5° with the step 0?5. The 
samples that give the most significant correlation are listed 
in Table 1. In addition, we also tried a scan over the intrin- 
sic radio luminosity as was done in TT01. The strongest 



effect gave sample #11: p = 4 x 10~ 4 with 25 intrinsically 
brightest BL Lac objects and S = 3° . 

With these results at hand, the investigator had to make 
a choice concerning the real sample. All best samples (ex- 
cept #4) have a reasonable value of optimal 5 (2° and 3°), 
which is close to the angular resolution of AGASA of 2?3. 
Finally, the "investigator" used sample #11 as the first 
choice. The second option was sample #90. 

The second task is the test for autocorrelation of the 
UHECR arrival directions. It was performed with the 
same array of random samples before the "investigator" 
was informed about the results of his choices in the first 
test. The autocorrelation signal is estimated in a similar 
way as described above for the cross-correlation signal: 



_ N V (N V -1) l-cosJ _ Sf^ flgi) 

2 l-cos(-10°)' N V (0 ' [6) 

where factor F — 1.4. 

Now, sample #67 showed maximum signal of p = 
0.5 x 10~ 3 at 5 = 2?5 (8 hits). The second sample showing 
strong autocorrelation was #30 with p = 1.7 x 10 -3 at 
6 = 2?1. The choice of the investigator was #67. 

The real observed sample of UHECRs had sequential 
number #67. Therefore the test at 99 per cent confi- 
dence level was unsuccessful for UHECRs- BL Lacs corre- 
lation and successful for UHECRs autocorrelation. Then 
we checked sample #67 for the cross-correlation with BL 
Lacs by varying C r and have not found any significant 
signal. 



4. INTERPRETATION OF THE RESULTS 

We can confirm that the autocorrelation signal in 
AGASA sample with the given energy threshold has a sig- 
nificance of at least 10~ 2 . To find the significance level we 
would have to vary the size of the random array and to 
find the limit when we are able to find the real sample. 
This objective is beyond the scope of this work. Probably, 
according to the correlation signal in the second best sam- 
ple, the significance is around 3 x 10~ 3 in agreement with 
Finley & Westerhoff (2004). One should notice, however, 
that this result refers to a specific sample with the energy 
cut of 4 x 10 19 eV (see Finley & Westerhoff 2004, for the 
discussion). To estimate the significance of real autocor- 
relation one has to perform the same procedure with an 
untruncated sample of UHECRs varying the energy cut in 
a reasonable range. 

Our negative result on cross-correlation with BL Lacs 
does not mean that we have found a quantitative dis- 
agreement with the results of TT01. They have found 
a positive signal with an another catalog of the confirmed 
BL Lac objects. Their cuts were: z > 0.1 or unknown, 
C r = 0.17 Jy, C = 18 m . At these cuts the positive sig- 
nal still exists at p = 1.9 x 10~ 2 and S = 2?5 (with factor 
F = 1.24, see Eq. 2) and the real sample #67 is the second 
significant among 99 random samples (having similar sig- 
nificance with three other samples including sample #11). 
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Fig. 1. — The fraction of 10 4 simulated random UHECRs sam- 
ples, rj, demonstrating a higher significance level for the "correlation 
signal" with the BL Lac catalog of Veron-Cetty & Veron (2003) (876 
objects) than a given value p for different cut optimization. From 
lower to higher curves: 1 - no cuts optimization with CV = 0.2 Jy, 
<5 = 2? 5, no cuts in optical brightness; 2 - optimization in C r with 
5 = 2? 5 and no cuts in optical brightness; 3 - optimization in both 
Cr and Co with <5 = 2? 5; 4 - optimization in Cr, Co, and <5. 

We just demonstrated that using the most straightfor- 
ward assumptions, blindly, one can hardly find the corre- 
lation signal. Regarding more specific cuts, like in TT01, 
one meets a problem of interpretation of the signal whether 
it is real or is just a consequence of cuts optimization 
(see also Evans, Ferrer & Sarkar 2003, 2004). The claim, 
that a given cut was motivated independently rather than 
was optimized, is not convincing unless the motivation has 
been done a priori. 

Now let us demonstrate how the multiple cuts optimiza- 
tion can actually mimic a significant signal. In this demon- 
stration we use 10 4 random UHECRs samples prepared 
as described in Sect. 2.2 and the BL Lac catalog with 
cuts, optimized for each random sample. Fig. 1 shows 
the fraction of random samples r\ which demonstrated a 
"significance of correlation" higher than p, after cuts op- 
timization. If we fix all the cuts (curve 1), then there is 
an approximate agreement between rj and p. If we opti- 
mize one cut, C T , then we obtain r\ a few times greater 
than p (actually, the ratio r\jp can be interpreted as the 
penalty factor discussed above). With two cuts optimiza- 
tion, adding a scan over visual magnitude, the ratio rj/p 
reaches almost two orders of magnitude and one out of 5 
samples demonstrates p < 0.01. If we add an optimiza- 
tion for the correlation angle 5, then every third random 



sample demonstrates a "significance" of 10 2 , every tenth 
gives p < 10~ 3 , and one out of thousand gives p = 10~ 6 ! 

5. SUMMARY 

We presented a method of a blind search for a hypo- 
thetic effect where various trials with different sub-samples 
or model parameters do not affect the stated significance 
level. We believe that a tradition to use this method, when 
possible, would dramatically reduce the number of uncon- 
firmed claims of marginally significant effects. The method 
is especially useful when: (i) there is a clear null hypothe- 
sis and a way to prepare random samples representing it; 
(ii) there exists a convenient measure of the statistical sig- 
nificance of the effect; (iii) the effect is uncertain in some 
respects, otherwise a test with the blind a priori formula- 
tion (i.e. it is a priori clear which data should be used and 
how the effect should look) is sufficient. Such problems as 
searches for cross-correlation between two classes of astro- 
physical objects usually satisfy all three conditions. We 
would like to emphasize that the proposed method is, in 
principle, applicable in any field of science. 

In this work, we performed a demonstration for only 
one size of the array of random samples. To find the sig- 
nificance level of the effect, one should make several trials 
with different array size starting from a larger one, then 
reducing its size until the real sample is found. The exam- 
iner should not disclose the real sample after unsuccessful 
trial. 

An effect detected with this method is credible because 
it ensures a researcher against unintentional overestima- 
tion of the significance. The only possible source of errors 
that can mimic a positive result is a wrong null hypothesis 
distinguishing random samples from the real sample. In 
the case considered in this paper, this could be for example 
a wrong exposure function of the UHECR detector. Oth- 
erwise, a positive result would have an explicit meaning: 
the chance that the effect does not exist is the inverse of 
the size of array of samples at the successful search. 

As an application of the proposed method, we analyzed 
a possible UHECR-BL Lac correlation. We found no sig- 
nificant correlation, but cannot claim, of course, that cor- 
relation does not exist. 

We are grateful to P. Tinyakov and I. Tkachev for use- 
ful discussions and to the anonymous referee for numerous 
useful suggestions. The work is supported by the RFBR 
grant 04-02-16987, the Academy of Finland grants 80750 
and 102181, the Jenny and Antti Wihuri Foundation, 
the Vilho, Yrjo and Kalle Vaisala, Foundation, and the 
NORDITA Nordic project in High Energy Astrophysics. 



REFERENCES 



Belli, B. M. 1997, ApJ, 479, L31 

Evans, N. W., Ferrer, F., & Sarkar, S. 2003, Phys. Rev. D, 67, 103005 
Evans, N. W., Ferrer, F., & Sarkar, S. 2004, Phys. Rev. D, 69, 128302 
Finlcy, C. B., & Westerhoff, S. 2004, Astroparticle Physics, 21, 359 
Hayashida, N., et al. 1996, PhRvLctt, 77, 1000 
Kolatt, T., & Piran, T. 1996, ApJ, 467, L41 

Romberg, B. V., Kurt, V. G., & Tikhomirova, Ya. Yu. 1997, Ap&SS, 
252, 465 

Takcda, M., ct al. 1999, ApJ, 522, 225 

Tinyakov, P. G., & Tkachev, I. I. 2001, JETP Lett, 74, 445 (TT01) 



Torres, D. F., Boldt, E., Hamilton, T., & Loewenstein, M. 2002, 

PhRvD, 66, 23001 
Uchihori, Y., Nagano, M., Takeda, M., Teshima, M., Lloyd-Evans, 

J., & Watson, A. A 2000, Astroparticle Physics, 13, 151 
Uryson, A. V. 2004, Astronomy Reports, 48, 81 
Veron-Cetty, M.-P., & Veron, P. 2003, A&A, 412, 399 
Virmani, A., Bhattacharya, S., Jain, P., Razzaque, S., Ralston, J. P., 

& Mckay, D. W. 2002, Astroparticle Physics, 17, 489 
Yoshiguchi, H., Nagataki, S., & Sato, K. 2004, ApJ, 614, 43 



